Improved Translation with Source Syntax Labels
نویسندگان
چکیده
We present a new translation model that include undecorated hierarchical-style phrase rules, decorated source-syntax rules, and partially decorated rules. Results show an increase in translation performance of up to 0.8% BLEU for German–English translation when trained on the news-commentary corpus, using syntactic annotation from a source language parser. We also experimented with annotation from shallow taggers and found this increased performance by 0.5% BLEU.
منابع مشابه
مدل ترجمه عبارت-مرزی با استفاده از برچسبهای کمعمق نحوی
Phrase-boundary model for statistical machine translation labels the rules with classes of boundary words on the target side phrases of training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...
متن کاملAutomatically Improved Category Labels for Syntax-Based Statistical Machine Translation
A common modeling choice in syntax-based statistical machine translation is the use of synchronous context-free grammars, or SCFGs. When training a translation model in a supervised setting, an SCFG is extracted from parallel text that has been statistically word-aligned and parsed by monolingual statistical parsers. However, the set of syntactic category labels used in a monolingual statistica...
متن کاملModeling Source Syntax for Neural Machine Translation
Even though a linguistics-free sequence to sequence model in neural machine translation (NMT) has certain capability of implicitly learning syntactic information of source sentences, this paper shows that source syntax can be explicitly incorporated into NMT effectively to provide further improvements. Specifically, we linearize parse trees of source sentences to obtain structural label sequenc...
متن کاملSyntax-Augmented Machine Translation using Syntax-Label Clustering
Recently, syntactic information has helped significantly to improve statistical machine translation. However, the use of syntactic information may have a negative impact on the speed of translation because of the large number of rules, especially when syntax labels are projected from a parser in syntax-augmented machine translation. In this paper, we propose a syntax-label clustering method tha...
متن کاملSyntax-aware Neural Machine Translation Using CCG
Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask tra...
متن کامل